Abstract

Consider the observation of $n$ iid realizations of an experiment with $d \ge 2$
possible outcomes, which corresponds to a single observation of a multinomial
distribution $\mathcal{M}_d(n, p)$ where $p$ is an unknown discrete distribution on $\{1, \ldots, d\}$. In
many applications, the construction of a confidence region for $p$ when $n$ is small
is crucial. This concrete and challenging problem has a long history. It is well known
that the confidence regions built from asymptotic statistics do not have good
coverage when $n$ is small. On the other hand, most available methods providing
non-asymptotic regions with controlled coverage are limited to the binomial case
$d = 2$. In the present work, we propose a new method valid for any $d \ge 2$. This
method provides confidence regions with controlled coverage and small volume, and
consists of the inversion of the "covering collection" associated with level-sets of the
likelihood. The behavior when $d/n$ tends to infinity remains an interesting open
problem beyond the scope of this work.

Keywords. Confidence regions, small samples, multinomial distribution. 
1 Introduction

Consider the observation of $n$ iid realizations $Y_1, \ldots, Y_n$ of an experiment with $d \ge 2$
possible outcomes with common discrete distribution $p_1\delta_1 + \cdots + p_d\delta_d$ on $\{1, \ldots, d\}$,
where $\delta_a$ denotes the Dirac mass at point $a$. This corresponds to a single observation
$X = (X_1, \ldots, X_d)$ of the multinomial distribution

$$\mathcal{M}_d(n, p) = \sum_{\substack{0 \le k_1, \ldots, k_d \le n \\ k_1 + \cdots + k_d = n}} \mu_p(k)\, \delta_{(k_1, \ldots, k_d)} \quad \text{where} \quad \mu_p(k) = p_1^{k_1} \cdots p_d^{k_d}\, \frac{n!}{k_1! \cdots k_d!},$$

where $p = (p_1, \ldots, p_d)$ and $X_k = \mathrm{Card}\{1 \le i \le n \text{ such that } Y_i = k\}$ for every $1 \le k \le d$.
Here $d$ is known, $X$ is observed, and $p$ is unknown. The present article deals with
the problem of constructing a confidence region for $p$ from the single observation $X$ of
$\mathcal{M}_d(n, p)$, in the non-asymptotic situation where $n$ is small. More precisely, let

$$\Lambda_d = \{(u_1, \ldots, u_d) \in [0, 1]^d \text{ such that } u_1 + \cdots + u_d = 1\}$$

be the simplex of probability distributions on $\{1, \ldots, d\}$. The observation $X \sim \mathcal{M}_d(n, p)$
lies in the discrete simplex

$$E_d = \{(x_1, \ldots, x_d) \in \{0, \ldots, n\}^d \text{ such that } x_1 + \cdots + x_d = n\}. \qquad (1)$$

From the single observation $X$ and for some prescribed level $\alpha \in (0, 1)$, we are interested
in the construction of a random region $R_\alpha(X) \subset \Lambda_d$ depending on $X$ and $\alpha$ such that

• the coverage probability has a prescribed lower bound

$$\mathbb{P}(p \in R_\alpha(X)) \ge 1 - \alpha \qquad (2)$$

• the volume of $R_\alpha(X)$ in $\mathbb{R}^d$ is as small as possible.

These two properties are the most important in practice. We propose to solve this problem
by defining the "level-set" confidence region $R_\alpha(X) \subset \Lambda_d$ given by

$$R_\alpha(X) = \{p \in \Lambda_d \text{ such that } \mu_p(X) \ge u(p, \alpha)\} \qquad (3)$$

where

$$u(p, \alpha) = \sup\Big\{u \in [0, 1] \text{ such that } \sum_{\substack{k \in E_d \\ \mu_p(k) \ge u}} \mu_p(k) \ge 1 - \alpha\Big\}.$$

One can check that this confidence region (3) always contains the maximum likelihood
estimator $n^{-1}X$ of $p$. Moreover, this region can be easily computed numerically, i.e. for
each value of $p$ one may compute $u(p, \alpha)$ and compare it to $\mu_p(X)$. Furthermore, it
fulfills (2), and the numerical computations presented in Section 3 show that it has small
volume and actual coverage often close to $1 - \alpha$, at least for $d = 2$ and $d = 3$. In fact, this
region is a special case of a generic method of construction based on covering collections.
The concept of covering collections is presented in Section 2 and encompasses as another
special case the classical Clopper-Pearson interval and its multivariate extensions. On the
other hand, it is well known (see for instance Remark 2.6) that a natural correspondence
via inversion exists between confidence regions with prescribed coverage and families of
tests with prescribed level. However, this correspondence is a simple translation and does
not give any clue to construct regions with small volume.
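
The membership test behind (3) can be sketched in a few lines. The code below is only an illustration under stated assumptions: the function names are hypothetical, the discrete simplex is enumerated by brute force (reasonable only for small $n$ and $d$), and a small floating-point tolerance is added when comparing the accumulated mass to $1 - \alpha$.

```python
from math import factorial, prod

def discrete_simplex(n, d):
    """Yield every count vector (k_1, ..., k_d) with non-negative entries summing to n."""
    if d == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in discrete_simplex(n - first, d - 1):
            yield (first,) + rest

def multinomial_pmf(k, p):
    """mu_p(k) = n! / (k_1! ... k_d!) * p_1^k_1 ... p_d^k_d."""
    coef = factorial(sum(k))
    for ki in k:
        coef //= factorial(ki)
    return coef * prod(pi ** ki for pi, ki in zip(p, k))

def level_set_threshold(p, n, alpha):
    """u(p, alpha): largest u such that {k : mu_p(k) >= u} has mass >= 1 - alpha."""
    masses = sorted((multinomial_pmf(k, p) for k in discrete_simplex(n, len(p))),
                    reverse=True)
    total = 0.0
    for u in masses:
        total += u
        if total >= 1 - alpha - 1e-12:  # tolerance for rounding (an assumption)
            return u
    return 0.0

def in_level_set_region(x, p, alpha=0.05):
    """True when p belongs to R_alpha(x), i.e. mu_p(x) >= u(p, alpha)."""
    return multinomial_pmf(x, p) >= level_set_threshold(p, sum(x), alpha)

x = (3, 1, 1)
print(in_level_set_region(x, (0.6, 0.2, 0.2)))  # True: the maximum likelihood estimate x / n
print(in_level_set_region(x, (0.0, 0.5, 0.5)))  # False: x is impossible under this p
```

Scanning a grid of values of $p$ with such a test gives a numerical picture of the region; the cost of each test grows with the cardinality of $E_d$.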
Two kinds of methods for the construction of a confidence region for p can be found
in the literature (see for instance pm EEH El El [3, [23] for reviews). The first methods 
give confidence regions with small volume but fail to control the prescribed coverage
(e.g. Bayesian methods with Jeffreys prior, Wald or Wilson score methods based on the
Central Limit Theorem, bootstrapped regions, ...), and the second control the prescribed
coverage but have too large a volume to be useful (e.g. concentration methods based on
Hoeffding-Bernstein inequalities, Clopper-Pearson type methods, ...). Note that the
discrete nature of the multinomial distribution produces a staircase effect which makes it
difficult to construct non-asymptotic regions with coverage exactly equal to $1 - \alpha$. For a
discussion of such aspects, we refer for instance to Agresti et al. [SI El [I]- In general, it 
seems reasonable to expect a coverage of at least $1 - \alpha$, without being too conservative,
while keeping the volume as small as possible. Here the term conservative means
that the coverage is greater than $1 - \alpha$. Even when $d = 2$ and $n$ is large but finite, the
confidence regions built from asymptotic approaches based on the Central Limit Theorem
have a poor and uncontrolled coverage. This is also the case for bootstrapped versions, which
only improve the coverage probability asymptotically (see [2H1 EB EES HH1 HS1 EZ])- For 
the binomial case d = 2, one of the best known method is due to Blyth & Still [8 J and 
combines various approaches. To our knowledge, the available methods for the general 
multinomial case d > 2 are unfortunately asymptotic or Bayesian, which explains their 
poor performances in terms of coverage or volume when n is small (see [2"ol [23| ffl \T7 \ |2"T] ) . 

The coverage of our region (3) is strictly controlled since it fulfills (2) whatever the
values of $d$ and $n$. However, this says nothing about the actual coverage and the actual
mean volume. The comparisons presented in Section 3 suggest that our region for $d = 2$
is comparable to the Blyth & Still region in terms of actual coverage and actual mean
volume. For $d = 3$, the Blyth & Still method is no longer available, and our region
seems to have an actual coverage close to the prescribed level while maintaining a volume
comparable to the asymptotic region constructed with the score method based on the
Central Limit Theorem. Section 3 provides two concrete examples, one for $d = 3$ and
another one for $d = 4$ in relation to the $\chi^2$-test. The article ends with a final discussion.

2 Covering collections 

The aim of this section is to introduce the notion of covering collection, which allows
confidence regions to be built in a general abstract space. Let us consider a random
variable $X : (\Omega, \mathcal{A}) \to (E, \mathcal{B}_E)$ having a distribution $\mu_{\theta^*}$ where $\theta^* \in \Theta$. For some
$\alpha \in (0, 1)$, we would like to construct a confidence region $R_\alpha(X)$ for $\theta^*$ with a coverage
of at least $(1 - \alpha)$, from a single realization of $X$. In other words,

$$\mathbb{P}(\theta^* \in R_\alpha(X)) \ge 1 - \alpha. \qquad (4)$$

Definition 2.1 (Covering collection). A covering collection of $E$ is a collection of
measurable events $(A_k)_{k \in \mathcal{K}} \subset \mathcal{B}_E$ such that

• $\mathcal{K}$ is totally ordered and has a minimal element and a maximal element;

• if $k \le k'$ then $A_k \subset A_{k'}$ with equality if and only if $k = k'$;

• $A_{\min(\mathcal{K})} = \emptyset$ and $A_{\max(\mathcal{K})} = E$.
For instance, for $E = \{0, 1, \ldots, n\}$, the sequence of sets

$$\emptyset,\ \{\sigma(0)\},\ \{\sigma(0), \sigma(1)\},\ \ldots,\ \{\sigma(0), \sigma(1), \ldots, \sigma(n)\} = E$$

is a covering collection of $E$ for any permutation $\sigma$ of $E$. For $E = \mathbb{R}$, the collection
$(A_t)_{t \in \overline{\mathbb{R}}}$ where $\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, +\infty\}$ defined by $A_{-\infty} = \emptyset$, $A_t = (-\infty, t]$ for every $t \in \mathbb{R}$, and
$A_{+\infty} = \mathbb{R}$ is a covering collection of $E$. Many other choices are possible, like $A_t = [-t, +t]$
or $A_t = [t, +\infty)$. We can recognize the usual shapes of the confidence regions used in
univariate Statistics.

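Theorem 2.2 (Confidence region associated with a covering collection). Let $(A_k)_{k \in \mathcal{K}}$ be
a covering collection of $E$, and $k_X$ be the smallest $k \in \mathcal{K}$ such that $X \in A_k$. For every
$\alpha \in (0, 1)$, the region $R_\alpha(X)$ defined below satisfies (4).

$$R_\alpha(X) = \{\theta \in \Theta \text{ such that } \mu_\theta(A_{k_X}) \ge \alpha\}. \qquad (5)$$

Proof. For every $\theta \in \Theta$, let $k_\alpha(\theta)$ be the largest $k \in \mathcal{K}$ such that $\mu_\theta(A_k) < \alpha$. With this
definition of $k_\alpha(\cdot)$, we then have

$$x \in A_{k_\alpha(\theta)} \text{ if and only if } \mu_\theta(A_{k_x}) < \alpha.$$

Thus we have

$$\mathbb{P}(\theta^* \in R_\alpha(X)) = \mathbb{P}(\mu_{\theta^*}(A_{k_X}) \ge \alpha) = \mathbb{P}(X \notin A_{k_\alpha(\theta^*)}) = 1 - \mu_{\theta^*}(A_{k_\alpha(\theta^*)}) \ge 1 - \alpha.$$

□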
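
The coverage bound of Theorem 2.2 can be checked exactly in a small discrete case. The following sketch assumes $E = \{0, \ldots, n\}$ with $X \sim \mathrm{Binom}(n, \theta)$, takes the covering collection generated by an arbitrary permutation of $E$, and reads the inequality in (5) as non-strict; the function name `coverage` is hypothetical.

```python
import random
from math import comb

def coverage(order, n, theta, alpha):
    """Exact P_theta(theta in R_alpha(X)) for X ~ Binom(n, theta).

    `order` lists the points of E = {0, ..., n} in the order in which the
    covering collection absorbs them, so A_{k_x} is the set of points up to x.
    """
    pmf = [comb(n, x) * theta**x * (1 - theta)**(n - x) for x in range(n + 1)]
    mass_of_A = 0.0
    covered = 0.0
    for x in order:
        mass_of_A += pmf[x]       # mu_theta(A_{k_x})
        if mass_of_A >= alpha:    # theta is kept by the region when X = x
            covered += pmf[x]
    return covered

random.seed(0)
n, alpha = 12, 0.05
for _ in range(5):
    order = random.sample(range(n + 1), n + 1)  # an arbitrary permutation of E
    worst = min(coverage(order, n, t / 100, alpha) for t in range(1, 100))
    print(order, round(worst, 4))
```

Whatever the permutation, the smallest coverage over the grid of $\theta$ stays at or above $1 - \alpha$, while its actual value depends strongly on the chosen collection.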

These confidence regions are highly dependent on the chosen covering collection
$(A_k)_{k \in \mathcal{K}}$. Each choice of covering collection gives a particular region $R_\alpha(X)$. Note that a
small value of $k_X$ gives a small set $A_{k_X}$ and thus leads to a confidence region with a small
volume. For instance, assume that we have two realizations $x_1$ and $x_2$ of $X$ with $k_{x_1} \le k_{x_2}$.
For a given sequence $(A_k)_{k \in \mathcal{K}}$, we have $A_{k_{x_1}} \subset A_{k_{x_2}}$ and thus $R_\alpha(x_1) \subset R_\alpha(x_2)$. It is
tempting to choose the covering collection $(A_k)_{k \in \mathcal{K}}$ in such a way that $k_X$ is as small
as possible. Unfortunately, with such a choice, the covering collection $(A_k)_{k \in \mathcal{K}}$ could be
random and the coverage of the associated region could be less than the prescribed level
$1 - \alpha$.

Note that the set $A_{k_X}$ can be empty, which means that a confidence region cannot be
built with the sequence $(A_k)_{k \in \mathcal{K}}$. In contrast, the case where $A_{k_X} = E$ leads to the trivial
region $R_\alpha(X) = \Theta$. In the case where $A_{k_X} = \{X\}$, we have $\mu_\theta(A_{k_X}) = \mu_\theta(\{X\})$, which
is the likelihood of $X$ at point $\theta$, and the region $R_\alpha(X)$ corresponds to the complement
of a level-set of the likelihood.

The following symmetrization lemma allows (for instance) the construction of two-sided
confidence intervals from one-sided confidence intervals. We use it in Section 2.2
to interpret the Clopper-Pearson confidence interval as a special case of the covering
collection method.
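Lemma 2.3 (Symmetrization). Consider a covering collection $(A_k)_{0 \le k \le \kappa}$ of $E$. For every
$0 \le k \le \kappa$ let us define $A'_k = E \setminus A_{\kappa - k}$. For any $\theta \in \Theta$, any $X \sim \mu_\theta$, and any $\alpha \in (0, 1)$,
we construct

$$R_{\frac{1}{2}\alpha} = \{\theta \in \Theta;\ \mu_\theta(A_{k_X}) \ge \tfrac{1}{2}\alpha\} \quad \text{and} \quad R'_{\frac{1}{2}\alpha} = \{\theta \in \Theta;\ \mu_\theta(A'_{k'_X}) \ge \tfrac{1}{2}\alpha\}$$

where $k'_X$ is built from $(A'_k)_{0 \le k \le \kappa}$ as $k_X$ from $(A_k)_{0 \le k \le \kappa}$ and $A'_{k'_X} = E \setminus A_{k_X - 1}$. Then

$$R_{\frac{1}{2}\alpha} \cap R'_{\frac{1}{2}\alpha}$$

is a confidence region with coverage greater than or equal to $1 - \alpha$.

Proof. We have $\mu_\theta(A_{k_X}) + \mu_\theta(A'_{k'_X}) = 1 + \mu_\theta(\{X\}) \ge 1$ and thus $R_{\frac{1}{2}\alpha}$ and $R'_{\frac{1}{2}\alpha}$ have
disjoint complements: a point $\theta$ lying in both complements would make the left-hand side
smaller than $\alpha < 1$. The conclusion follows now from a general fact: if $R_1$ and $R_2$
are two confidence regions with a coverage of at least $1 - \frac{1}{2}\alpha$ such that $R_1 \cup R_2 = \Theta$
(equivalently $R_1^c = \Theta \setminus R_1$ and $R_2^c = \Theta \setminus R_2$ are disjoint), then
$R_1 \cap R_2 = (R_1^c \cup R_2^c)^c$ is a confidence region with a coverage of at least $1 - \alpha$. □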

Remark 2.4 (Discrete case and staircase effect). Let $(A_k)_{k \in \mathcal{K}}$ be a covering collection of
a finite set $E$. Due to staircase effects, the coverage of the confidence regions constructed
from this covering collection cannot take arbitrary values in $(0, 1)$. These staircase effects
can be reduced by using a fully granular collection for which $\mathrm{Card}(\mathcal{K}) = \mathrm{Card}(E)$. The
term fully granular means that the elements of the collection are obtained by adding the
points of $E$ one by one. It is impossible to remove completely the staircase effects when
$E$ is discrete, while maintaining a prescribed lower bound on the coverage.

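Remark 2.5 (Reverse regions). For the region $R_\alpha(X) = \{\theta \in \Theta;\ \mu_\theta(A_{k_X}) < 1 - \alpha\}$ we
have

$$\mathbb{P}(\theta \in R_\alpha(X)) = \mathbb{P}(\mu_\theta(A_{k_X}) < 1 - \alpha) = \mathbb{P}(X \in A_{k_{1-\alpha}}) = \mu_\theta(A_{k_{1-\alpha}}) < 1 - \alpha.$$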

Remark 2.6 (Link with tests). Let us recall briefly the correspondence between confidence 
regions and statistical tests (we refer to JE, Section 48] for further details). Consider a 
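parametric model $(\mu_\theta)_{\theta \in \Theta}$ with data space $\mathcal{X}$. For any fixed $\theta_0 \in \Theta$, the test problem of
$H_0 : \theta = \theta_0$ versus $H_1 : \theta \ne \theta_0$ with level $\alpha \in (0, 1)$ corresponds to the construction of an
acceptance region $C_\alpha(\theta_0) \subset \mathcal{X}$ such that

$$\mu_{\theta_0}(C_\alpha(\theta_0)) \ge 1 - \alpha.$$

The construction of a confidence region for $\theta_0$ can be done by inversion (i.e. by collecting
the values of $\theta_0$ for which $H_0$ is accepted). Namely, for every $x \in \mathcal{X}$, one can define the
region $R_\alpha(x) \subset \Theta$ by

$$R_\alpha(x) = \{\theta \in \Theta \text{ such that } x \in C_\alpha(\theta)\}.$$

Now if $X \sim \mu_{\theta_0}$ then

$$\mathbb{P}(\theta_0 \in R_\alpha(X)) = \mathbb{P}(X \in C_\alpha(\theta_0)) = \mu_{\theta_0}(C_\alpha(\theta_0)) \ge 1 - \alpha.$$

This shows that for any fixed $\theta_0 \in \Theta$, the set $R_\alpha(X) \subset \Theta$ is a confidence region for
$\theta_0$ when $X \sim \mu_{\theta_0}$. Conversely, if for every $\theta_0 \in \Theta$ and every $x \in \mathcal{X}$ one has a region
$R_\alpha(x) \subset \Theta$ such that $\mathbb{P}(\theta_0 \in R_\alpha(X)) \ge 1 - \alpha$ when $X \sim \mu_{\theta_0}$, then one can construct
immediately a test for $H_0 : \theta = \theta_0$ versus $H_1 : \theta \ne \theta_0$ with acceptance region

$$C_\alpha(\theta_0) = \{x \in \mathcal{X} \text{ such that } \theta_0 \in R_\alpha(x)\}.$$

Note that this correspondence between confidence regions and statistical tests can be
extended to the composite case $H_0 : \theta \in \Theta_0$ versus $H_1 : \theta \notin \Theta_0$ where $\Theta_0 \subset \Theta$.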

2.1 The level-sets regions 

In this section, we show that the "level-sets" confidence region (3) is a special case of the
covering collection method. It is easier to consider here a decreasing covering collection
(the corresponding version of Theorem 2.2 is immediate). Let us consider a random
variable $X : (\Omega, \mathcal{A}) \to (E, \mathcal{B}_E)$ with law $\mu_{\theta^*}$ where $\theta^* \in \Theta$. For every $u \ge 0$ and $\theta \in \Theta$,
let us define

$$A(\theta, u) = \{x \in E \text{ such that } \mu_\theta(x) \ge u\}.$$

For every $\theta \in \Theta$, the collection $(A(\theta, u))_{u \ge 0}$ is decreasing with $A(\theta, 0) = E$ and there exists
$u_{\max}$ that can be equal to $+\infty$ such that $A(\theta, u_{\max}) = \emptyset$. Also, $(A(\theta, u_{\max} - u))_{u \in [0, u_{\max}]}$
is a covering collection of $E$. Next, define

$$u(\theta, \alpha) = \sup\{u \in [0, u_{\max}] \text{ such that } \mu_\theta(A(\theta, u)) \ge 1 - \alpha\}$$

and

$$K(\theta, \alpha) = A(\theta, u(\theta, \alpha)).$$

We would like to construct a confidence region for $\theta^*$ from the observation of $X \sim \mu_{\theta^*}$. If

$$R_\alpha(X) = \{\theta \in \Theta \text{ such that } X \in K(\theta, \alpha)\} \qquad (6)$$

then

$$\mathbb{P}(\theta^* \in R_\alpha(X)) = \mathbb{P}(X \in K(\theta^*, \alpha)) = \mu_{\theta^*}(K(\theta^*, \alpha)) \ge 1 - \alpha.$$

This shows that $R_\alpha(X)$ is a confidence region for $\theta^*$ with a coverage of at least $1 - \alpha$.
Let us clarify the expression of the confidence region for the general multinomial case where
$X \sim \mathcal{M}_d(n, p)$ with $p \in \Lambda_d$ and $d \ge 2$. Here the value of $p$ used for the observed data
$X$ plays the role of $\theta^*$. We have $\Theta = \Lambda_d$, $E = E_d$ as described by (1), $\mu_\theta = \mathcal{M}_d(n, \theta)$,
and $u_{\max} = 1$. For every $\alpha \in (0, 1)$, the confidence region given by the level-sets method
is expressed as in (3) given in the introduction.

Optimality 

Let us focus on the case where $E$ is a finite set. The confidence region constructed
above is not optimal among the $1 - \alpha$ conservative regions and thus could be improved
by a more detailed analysis. Let us first note that by its very construction, for each
$\theta \in \Theta$, $K(\theta, \alpha)$ is minimal with respect to its cardinality, that is, there is no set
$B(\theta, \alpha)$ such that $\mu_\theta(B(\theta, \alpha)) \ge 1 - \alpha$ and $\mathrm{card}(B(\theta, \alpha)) < \mathrm{card}(K(\theta, \alpha))$. However, in
some circumstances, sets $L(\theta, \alpha)$ may exist with the same cardinality as $K(\theta, \alpha)$ so that
$\mu_\theta(K(\theta, \alpha)) > \mu_\theta(L(\theta, \alpha)) \ge 1 - \alpha$. The following theorem gives a condition that allows
conservative sets to be built but with a coverage closer to $1 - \alpha$ than the coverage of
$R_\alpha(X)$. For all $\alpha \in [0, 1]$ and $\theta \in \Theta$, let us denote $\gamma(\theta, \alpha) = 1 - \mu_\theta(K(\theta, \alpha))$ and let us
note that $\gamma(\theta, \alpha) \le \alpha$.
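Theorem 2.7. If for each $\theta \in \Theta$ there exist two subsets $V(\theta, \alpha) \subset K(\theta, \alpha)$ and
$W(\theta, \alpha) \subset E \setminus K(\theta, \alpha)$ with the same cardinality so that

$$\alpha - \gamma(\theta, \alpha) \ge \mu_\theta(V(\theta, \alpha)) - \mu_\theta(W(\theta, \alpha)) > 0,$$

then there exists a set $T_\alpha(X) \ne R_\alpha(X)$ so that

$$1 - \alpha \le \mathbb{P}(\theta^* \in T_\alpha(X)) < \mathbb{P}(\theta^* \in R_\alpha(X)).$$

Proof. Let us consider the set $L(\theta, \alpha) = K(\theta, \alpha) \setminus V(\theta, \alpha) \cup W(\theta, \alpha)$ and note that
thanks to the conditions imposed for the sets $V$ and $W$ we have for all $\theta \in \Theta$,

$$1 - \alpha \le \mu_\theta(L(\theta, \alpha)) < \mu_\theta(K(\theta, \alpha)).$$

Now, with $T_\alpha(X) = \{\theta \in \Theta;\ X \in L(\theta, \alpha)\}$ we have

$$\begin{aligned}
\mathbb{P}(\theta^* \in T_\alpha(X)) &= \mathbb{P}(X \in L(\theta^*, \alpha)) \\
&= \mathbb{P}(X \in K(\theta^*, \alpha) \setminus V(\theta^*, \alpha) \cup W(\theta^*, \alpha)) \\
&= 1 - \gamma(\theta^*, \alpha) - \mu_{\theta^*}(V(\theta^*, \alpha)) + \mu_{\theta^*}(W(\theta^*, \alpha)) \\
&< 1 - \gamma(\theta^*, \alpha).
\end{aligned}$$

On the other hand, we have already seen that for all $\theta \in \Theta$,

$$1 - \alpha \le \mu_\theta(L(\theta, \alpha)).$$

This last inequality holds true when $\theta = \theta^*$ and thus

$$1 - \alpha \le \mu_{\theta^*}(L(\theta^*, \alpha)) = \mathbb{P}(\theta^* \in T_\alpha(X)).$$

□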

This theorem can be used to build less conservative confidence sets than $R_\alpha(X)$. A
convenient way to proceed is to take $V(\theta, \alpha) = \{y\}$ where $y$ is such that

$$\mu_\theta(y) = \min_{z \in K(\theta, \alpha)} \mu_\theta(z)$$

and to iteratively try several sets $W^k$ as follows. Set $W^0(\theta, \alpha) = \emptyset$, and at iteration $k \ge 1$,
set $W^k(\theta, \alpha) = \{w_k\}$ and $L^k(\theta, \alpha) = K(\theta, \alpha) \setminus V(\theta, \alpha) \cup W^k(\theta, \alpha)$ where

$$w_k = \arg\max_{z \in L^{k-1}(\theta, \alpha)} \mu_\theta(z).$$

This process is iterated until the set $L^k(\theta, \alpha)$ is such that $\mu_\theta(L^k(\theta, \alpha)) - (1 - \alpha)$ is
non-negative and minimum.

Since for $\theta \in \Theta$ there may exist $x \ne y$ with $\mu_\theta(x) = \mu_\theta(y)$, there also may exist several
sets $(L^i(\theta, \alpha))_i$ which have the same mass $\mu_\theta(L^i(\theta, \alpha)) = 1 - \delta(\theta, \alpha)$. Several confidence
sets with the same coverage can thus be derived using these sets. A simple way to choose
between these concurrent confidence sets is to adopt the one that optimizes a criterion
such as having a minimum volume (for the Lebesgue measure).
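
One possible reading of the swap behind Theorem 2.7 is sketched below for a fixed $\theta$. It is an illustration only: the function name `swap_once` is hypothetical, a single point $y$ of minimal mass in $K(\theta, \alpha)$ is exchanged, and the candidate points $w$ are assumed to be taken outside $K(\theta, \alpha)$, as required of $W$ in the theorem. Among the admissible exchanges, the one whose mass is closest to $1 - \alpha$ from above is kept.

```python
from math import comb

def swap_once(masses, alpha):
    """masses maps each point of E to its probability under a fixed theta.

    Returns (K, L, mass of K, mass of L), where L is K with its lightest point
    exchanged for an outside point, if such an exchange keeps mass >= 1 - alpha.
    """
    order = sorted(masses, key=masses.get, reverse=True)
    total = 0.0
    for z in order:
        total += masses[z]
        if total >= 1 - alpha - 1e-12:      # tolerance for rounding (an assumption)
            threshold = masses[z]           # plays the role of u(theta, alpha)
            break
    K = {z for z in order if masses[z] >= threshold}
    mass_K = sum(masses[z] for z in K)
    y = min(K, key=masses.get)              # lightest point of K
    best_w, best_mass = None, mass_K
    for w in order:
        if w in K:
            continue
        candidate = mass_K - masses[y] + masses[w]
        if 1 - alpha <= candidate < best_mass:
            best_w, best_mass = w, candidate
    L = K if best_w is None else (K - {y}) | {best_w}
    return K, L, mass_K, best_mass

n, theta = 10, 0.3
masses = {k: comb(n, k) * theta**k * (1 - theta)**(n - k) for k in range(n + 1)}
K, L, mass_K, mass_L = swap_once(masses, alpha=0.05)
print(sorted(K), round(mass_K, 4))
print(sorted(L), round(mass_L, 4))
```

In this binomial illustration the level set $\{1, \ldots, 6\}$ is exchanged for $\{0, \ldots, 5\}$, which has the same cardinality and a mass closer to $0.95$.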
2.2 The Clopper-Pearson regions

Consider the binomial case $d = 2$ for which $p = (p_1, 1 - p_1)$. The well known Clopper-Pearson
interval for $p_1$ relies on the exact distribution of $X_1$ in the binomial case [13], [20].
It was considered for a long time as outstanding. This interval $[L, U]$ is given by

$$\begin{aligned}
L &= \inf\Big\{\theta \in [0, 1] \text{ such that } \sum_{i=X_1}^{n} \binom{n}{i} \theta^i (1 - \theta)^{n-i} \ge \tfrac{1}{2}\alpha\Big\}, \\
U &= \sup\Big\{\theta \in [0, 1] \text{ such that } \sum_{i=0}^{X_1} \binom{n}{i} \theta^i (1 - \theta)^{n-i} \ge \tfrac{1}{2}\alpha\Big\}.
\end{aligned} \qquad (7)$$

It has been shown that the Clopper-Pearson interval is often conservative. Also, some
continuity corrections have been proposed, and give the so called "mid-p interval", see [5]
for a review. This trick reduces the staircase effect but the coverage probability can be
less than $1 - \alpha$. The Beta-Binomial correspondence (see Lemma 2.8 below) shows that
the lower and upper limits $L$ and $U$ of the Clopper-Pearson confidence interval (7) are
respectively the $\frac{1}{2}\alpha$ quantile of the Beta distribution $\mathrm{Beta}(X_1, n - X_1 + 1)$ and the
$(1 - \frac{1}{2}\alpha)$ quantile of $\mathrm{Beta}(X_1 + 1, n - X_1)$.

Lemma 2.8 (Beta-Binomial correspondence). If $X \sim \mathrm{Binom}(n, p_1)$ with $p_1 \in [0, 1]$ and
$0 < k \le n$ and $B \sim \mathrm{Beta}(k, n - k + 1)$ then the following identity holds true.

$$\mathbb{P}(X \ge k) = \mathbb{P}(B \le p_1). \qquad (8)$$

Proof. We briefly recall here the classical proof (see [9, page 68]). Let $U_1, \ldots, U_n$ be iid
uniform random variables on $[0, 1]$ and $U_{(1)} \le \cdots \le U_{(n)}$ be the reordered sequence. If
we define $V_{p_1} = \sum_{i=1}^{n} \mathbf{1}_{\{U_i \le p_1\}}$ then $V_{p_1} \sim \mathrm{Binom}(n, p_1)$ and $U_{(k)} \sim \mathrm{Beta}(k, n - k + 1)$ and
for every $1 \le k \le n$, $V_{p_1} \ge k$ if and only if $U_{(k)} \le p_1$. □
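
Identity (8) is easy to check numerically. The sketch below compares the binomial upper tail with the Beta distribution function, the latter being approximated by composite Simpson integration of the Beta density; the helper names and the number of integration steps are arbitrary choices made for this illustration.

```python
from math import comb, factorial

def binom_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binom(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def beta_cdf(a, b, x, steps=2000):
    """P(B <= x) for B ~ Beta(a, b) with integer a, b >= 1 (Simpson's rule, even `steps`)."""
    # 1 / B(a, b) = (a + b - 1)! / ((a - 1)! (b - 1)!) for integer parameters
    const = factorial(a + b - 1) / (factorial(a - 1) * factorial(b - 1))

    def density(t):
        return const * t**(a - 1) * (1 - t)**(b - 1)

    h = x / steps
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3

n, p = 10, 0.3
for k in range(1, n + 1):
    print(k, binom_upper_tail(n, k, p), beta_cdf(k, n - k + 1, p))
```

The two columns agree up to the integration error, which is what allows the limits in (7) to be read off as Beta quantiles.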

The confidence interval obtained by the level-sets method does not coincide with
the classical Clopper-Pearson confidence interval. Let us show why the Clopper-Pearson
confidence interval can be considered as a special case of the method based on covering
collections. Recall that we are in the case where $d = 2$ and $X_1 \sim \mathrm{Binom}(n, p_1)$ for some
unknown $p_1 \in [0, 1]$. This can also be written $(X_1, n - X_1) \sim \mathcal{M}_2(n, (p_1, 1 - p_1))$. The
unidimensional nature of $E = \{0, \ldots, n\}$ suggests the following two covering collections
$(A^{\uparrow}_k)_{k \in E}$ and $(A^{\downarrow}_k)_{k \in E}$ defined by $A^{\uparrow}_0 = \emptyset$ and $A^{\downarrow}_0 = \emptyset$, and for every $0 \le k \le n$,

$$A^{\uparrow}_{k+1} = \{0, \ldots, k\} \quad \text{and} \quad A^{\downarrow}_{k+1} = \{n - k, \ldots, n\}.$$

Here $\mathcal{K} = E$ for both the top-to-bottom and bottom-to-top sequences. The bottom-to-top
sequence $(A^{\uparrow}_k)_{k \in E}$ leads to a $(1 - \alpha)$ one-sided confidence interval for $p_1$ given by

$$R^{\uparrow}_\alpha(X_1) = \Big\{\theta \in [0, 1] \text{ such that } \sum_{i=0}^{X_1} \binom{n}{i} \theta^i (1 - \theta)^{n-i} \ge \alpha\Big\} = [0, U_\alpha(X_1)] \qquad (9)$$

where

$$U_\alpha(x) = \sup\Big\{\theta \in [0, 1] \text{ such that } \sum_{i=0}^{x} \binom{n}{i} \theta^i (1 - \theta)^{n-i} \ge \alpha\Big\}.$$

On the other hand, the top-to-bottom covering collection $(A^{\downarrow}_k)_{k \in E}$ leads to a $(1 - \alpha)$
confidence interval of $p_1$ given by

$$R^{\downarrow}_\alpha(X_1) = \Big\{\theta \in [0, 1] \text{ such that } \sum_{i=X_1}^{n} \binom{n}{i} \theta^i (1 - \theta)^{n-i} \ge \alpha\Big\} = [L_\alpha(X_1), 1] \qquad (10)$$

where

$$L_\alpha(x) = \inf\Big\{\theta \in [0, 1] \text{ such that } \sum_{i=x}^{n} \binom{n}{i} \theta^i (1 - \theta)^{n-i} \ge \alpha\Big\}.$$

By virtue of Lemma 2.3 we can combine the one-sided confidence intervals (9) and (10)
in order to obtain a two-sided $(1 - \alpha)$ confidence interval of $p_1$, which is the two-sided
interval

$$R^{\uparrow}_{\frac{1}{2}\alpha}(X_1) \cap R^{\downarrow}_{\frac{1}{2}\alpha}(X_1) = [L_{\frac{1}{2}\alpha}(X_1), U_{\frac{1}{2}\alpha}(X_1)].$$

We recognize the Clopper-Pearson interval (7). The discrete nature of $E$ precludes the
construction of a confidence interval of $p_1$ with coverage exactly equal to $1 - \alpha$. Actually,
the Clopper-Pearson interval is not exactly symmetric and there is no guarantee that

$$\mathbb{P}\big(p_1 < L_{\frac{1}{2}\alpha}(X_1)\big) = \mathbb{P}\big(p_1 > U_{\frac{1}{2}\alpha}(X_1)\big).$$
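
The limits of the Clopper-Pearson interval can be obtained numerically by solving the two tail equations in $\theta$. The following sketch uses plain bisection, which is an implementation choice of this illustration and not a method prescribed by the text; the function names are hypothetical.

```python
from math import comb

def binom_tail_ge(n, x, theta):
    """P(X >= x) for X ~ Binom(n, theta); increasing in theta."""
    return sum(comb(n, i) * theta**i * (1 - theta)**(n - i) for i in range(x, n + 1))

def clopper_pearson(x, n, alpha=0.05, tol=1e-12):
    """Two-sided interval [L, U] with L and U solving the two tail equations."""
    def solve(f, target):
        # f is increasing on [0, 1]; locate f(theta) = target by bisection
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # L solves P_theta(X >= x) = alpha / 2 (L = 0 when x = 0)
    lower = 0.0 if x == 0 else solve(lambda t: binom_tail_ge(n, x, t), alpha / 2)
    # U solves P_theta(X <= x) = alpha / 2, i.e. P_theta(X >= x + 1) = 1 - alpha / 2
    upper = 1.0 if x == n else solve(lambda t: binom_tail_ge(n, x + 1, t), 1 - alpha / 2)
    return lower, upper

print(clopper_pearson(0, 10))   # upper limit equals 1 - 0.025 ** (1 / 10)
print(clopper_pearson(3, 10))
```

For $x = 0$ the upper limit has the closed form $1 - (\alpha/2)^{1/n}$, which gives a convenient sanity check of the numerical solution.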



Our construction via a covering collection immediately provides an extension of the
Clopper-Pearson interval in the general multinomial case where $X \sim \mathcal{M}_d(n, p)$ with
$p \in \Lambda_d$ and $d > 2$. This construction consists of labeling the elements of $E_d$ (note that
$\mathrm{Card}(E_d) = \binom{n+d-1}{d-1}$) and constructing the covering collection $(A_k)_{k \in \mathcal{K}}$ which grows by
adding the points one after the other. The choice of the total order on $E_d$ is arbitrary
when $d > 2$. Some additional constraints can help to reduce this choice. As advocated by
Casella [12] for the binomial distribution, the proposed confidence region $R_\alpha(X)$ should
be equivariant, that is, not sensitive to the order chosen to label the $d$ categories of the
multinomial distribution.
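
As an illustration of the labeling step, the sketch below enumerates $E_d$ in lexicographic order (one arbitrary total order among many), checks the cardinality formula, and groups the points into classes that are stable under permutation of the coordinates. The function name and the choice $n = 10$, $d = 3$ are assumptions of this example.

```python
from math import comb

def discrete_simplex(n, d):
    """Yield the points of E_d in lexicographic order."""
    if d == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in discrete_simplex(n - first, d - 1):
            yield (first,) + rest

n, d = 10, 3
points = list(discrete_simplex(n, d))
print(len(points), comb(n + d - 1, d - 1))  # prints: 66 66

# Points that differ only by a permutation of coordinates share the same sorted tuple.
orbits = {}
for k in points:
    orbits.setdefault(tuple(sorted(k)), []).append(k)
print(len(orbits), "permutation classes")
```

A collection that absorbs whole permutation classes at once is invariant by permutation of coordinates, whereas a collection that adds points one by one generally splits these classes.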

Definition 2.9 (Equivariance). A confidence region $R_\alpha(X)$ is equivariant when

$$\mathbb{P}(\sigma(\theta^*) \in R_\alpha(\sigma(X))) = \mathbb{P}(\theta^* \in R_\alpha(X)) \qquad (11)$$

for every permutation $\sigma$ of $\{1, \ldots, d\}$. In other words, if and only if

$$\sigma(R_\alpha(X)) = R_\alpha(\sigma(X)).$$

The following theorem gives a criterion of equivariance for covering collections.

Theorem 2.10 (Equivariance criterion for covering collections). The confidence region
$R_\alpha(X)$ constructed from a covering collection $(A_k)_{k \in \mathcal{K}}$ is equivariant if and only if $A_k$ is
invariant by permutation of coordinates for every $k \in \mathcal{K}$.

Proof. Let $\sigma$ be a permutation of $\{1, \ldots, d\}$, $i = (i_1, \ldots, i_d) \in E$, and for every $\theta \in \Theta$,

$$\sigma(\theta) = (\theta_{\sigma(1)}, \ldots, \theta_{\sigma(d)}) \quad \text{and} \quad \sigma(i) = (i_{\sigma(1)}, \ldots, i_{\sigma(d)}).$$

By invariance of $A_k$ by permutation, we have $X \in A_k \Leftrightarrow X \in \sigma(A_k)$ and thus $k_X = k_{\sigma(X)}$.
If $\theta \in \sigma(R_\alpha(X))$ then $\mu_{\sigma^{-1}(\theta)}(A_{k_X}) \ge \alpha$. But, for every $i \in E$,

$$\mu_{\sigma^{-1}(\theta)}(\{i\}) = \mu_\theta(\{\sigma(i)\}).$$

If $A_k$ is invariant by permutations, then for every $i \in A_k$, we have $\sigma(i) \in A_k$ and
consequently

$$\mu_{\sigma^{-1}(\theta)}(A_k) = \mu_\theta(\sigma(A_k)) = \mu_\theta(A_k).$$

Thus, $\theta \in \sigma(R_\alpha(X))$ if and only if $\mu_\theta(A_{k_X}) = \mu_\theta(A_{k_{\sigma(X)}}) \ge \alpha$, that is $\theta \in R_\alpha(\sigma(X))$. □
Equivariance imposes a strong constraint on the covering collection. A large set $A_{k_X}$
gives a large confidence region. Since confidence regions with small volume are desirable,
it is interesting, when $E$ is discrete, to consider a covering collection $(A_k)_{k \in \mathcal{K}}$ which grows
by adding the points of $E$ one after the other. Unfortunately, this method of construction
is not compatible with equivariance: the $A_k$ cannot all be invariant by permutations of
coordinates. A weaker condition consists of the existence of a subsequence $(A_{k_l})_l$ that
is invariant by permutation of coordinates. An example of such a sequence for $d = 3$ is
given in Figure 1.

Recall that when $d = 2$, the Beta-Binomial correspondence stated in Lemma 2.8
provides a clear link between the quantiles of the Beta distribution and the Clopper-Pearson
confidence interval. In fact, this can be seen as a special case of the Dirichlet-Multinomial
correspondence valid for any $d \ge 3$ as stated in the following lemma. This
makes a link between Clopper-Pearson regions and Bayesian regions constructed with a
Jeffreys prior (see for instance [21]). However, the notion of coverage that we use in the
present article is purely frequentist and does not fit with the Bayesian paradigm without
serious distortions.

Lemma 2.11 (Dirichlet-Multinomial correspondence). Let $p \in \Lambda_d$ and $k_0, k_1, \ldots, k_d$ be
such that $k_0 = 0 < k_1 < \cdots < k_{d-1} \le n < k_d = n + 1$. If

$$X \sim \mathcal{M}_d(n, p) \quad \text{and} \quad D \sim \mathrm{Dirichlet}_d(k_1 - k_0, k_2 - k_1, \ldots, k_d - k_{d-1})$$

then the following identity holds true:

$$\begin{aligned}
&\mathbb{P}(X_1 \ge k_1,\ X_1 + X_2 \ge k_2,\ \ldots,\ X_1 + \cdots + X_{d-1} \ge k_{d-1}) \\
&\quad = \mathbb{P}(D_1 \le p_1,\ D_1 + D_2 \le p_1 + p_2,\ \ldots,\ D_1 + \cdots + D_{d-1} \le p_1 + \cdots + p_{d-1}).
\end{aligned} \qquad (12)$$

Proof. The proof is a direct extension of the Beta-Binomial case given by Lemma 2.8. Let
$I_1, \ldots, I_d$ be the sequence of adjacent sub-intervals of $[0, 1]$ of respective lengths $p_1, \ldots, p_d$,
$U_1, \ldots, U_n$ be iid uniform random variables on $[0, 1]$ and $U_{(1)} \le \cdots \le U_{(n)}$ be the reordered
sequence. For any $1 \le r \le d$, let us define

$$V_{p,r} = \sum_{i=1}^{n} \mathbf{1}_{\{U_i \in I_r\}} = \mathrm{Card}\{1 \le i \le n \text{ such that } U_i \in I_r\}.$$

We have $V_p = (V_{p,1}, \ldots, V_{p,d}) \sim \mathcal{M}_d(n, p)$. Now, for every $0 < k_1 < \cdots < k_{d-1} \le n$,

$$V_{p,1} \ge k_1,\ \ldots,\ V_{p,1} + \cdots + V_{p,d-1} \ge k_{d-1} \quad \text{iff} \quad U_{(k_1)} \le p_1,\ \ldots,\ U_{(k_{d-1})} \le p_1 + \cdots + p_{d-1}.$$

But by using the notation $U_{(0)} = 0$ and $U_{(n+1)} = 1$, we have

$$(U_{(1)} - U_{(0)}, \ldots, U_{(n+1)} - U_{(n)}) \sim \mathrm{Dirichlet}_{n+1}(1, \ldots, 1),$$

and therefore, by the stability of Dirichlet laws by sum of blocks, with $k_0 = 0$ and
$k_d = n + 1$,

$$(U_{(k_1)} - U_{(k_0)}, \ldots, U_{(k_d)} - U_{(k_{d-1})}) \sim \mathrm{Dirichlet}_d(k_1, k_2 - k_1, \ldots, k_d - k_{d-1}).$$

□
3 Comparisons and examples



Recall that for every fixed $d \ge 2$, $n > 0$, and $p \in \Lambda_d$, a confidence region obtained from
$X \sim \mathcal{M}_d(n, p)$ provides a single coverage probability and a distribution of volumes. In
this section, we use coverage probabilities and mean volumes to compare the performance
of our level-set method with other methods, in the case where $d \in \{2, 3\}$ and $n \in
\{5, 10, 20, 30\}$. We also give two concrete examples, one for $d = 3$ and another one
for $d = 4$ in relation to the $\chi^2$-test. It turns out that the regions obtained by the
Clopper-Pearson method and its multinomial extension have non-competitive volumes so
we decided to ignore them in the comparisons.

3.1 Performances in the binomial case (d = 2) 

In the binomial case $d = 2$, a confidence region for $p = (p_1, 1 - p_1)$ is actually a confidence
interval for $p_1$. It is well known that the Wald interval constructed from the Central Limit
Theorem has poor coverage even when $n$ is large but finite [10]. It is also widely accepted
that the Wilson score interval [27], [10] or the Blyth-Still interval [6] should be preferred
to the Wald interval. We therefore compared the performances of the 95%-intervals
provided by the level-sets method, the score method, and the Blyth-Still method. We
computed the coverages and the mean widths of the intervals obtained with each method
for $n \in \{5, 10, 20, 30\}$ and for all $p_1 \in [0, 0.5]$. The results are represented in Figures 2 and
3 respectively. We can see that for some values of $p_1$, the coverage of the score method is
smaller than the prescribed level of 0.95, whereas the coverage of the Blyth-Still interval
and the level-set interval are always greater than or equal to this prescribed level 0.95.
The coverages obtained with the level-set method are always closer to the prescribed level
except for $n = 20$, $p_1 \in [0.45, 0.48]$ and $n = 30$, $p_1 \in [0.38, 0.42]$. The differences between
the coverages of these three methods decrease with $n$.

Figures [2] and [3] show that the score method provides intervals with excellent mean 
width but fails to control the coverage. The level-set method gives intervals that have a 
slightly narrower mean width than the one obtained with the Blyth-Still method. This 
suggests that the level-set method provides an excellent alternative to the Blyth-Still 
method. Moreover, and in contrast to the Blyth-Still method, the level-set method can 
still be used when d > 2. 
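
The exact coverage of an interval procedure at a given $p_1$ is a finite sum over the possible values of $X_1$, which makes this kind of comparison easy to reproduce. The sketch below contrasts the Wilson score interval with the level-set interval for $d = 2$; the function names, the normal quantile constant and the floating-point tolerance are assumptions of this illustration, and the Blyth-Still interval is not implemented.

```python
from math import comb, sqrt

Z = 1.959963984540054  # 97.5% quantile of the standard normal distribution

def binom_pmf(n, x, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def wilson_contains(x, n, p, z=Z):
    """True when p lies in the Wilson score interval computed from x successes."""
    center = (x + z * z / 2) / (n + z * z)
    half = z / (n + z * z) * sqrt(x * (n - x) / n + z * z / 4)
    return center - half <= p <= center + half

def level_set_contains(x, n, p, alpha=0.05):
    """True when mu_p(x) >= u(p, alpha), i.e. p lies in the level-set region of x."""
    total = 0.0
    for u in sorted((binom_pmf(n, k, p) for k in range(n + 1)), reverse=True):
        total += u
        if total >= 1 - alpha - 1e-12:  # tolerance for rounding (an assumption)
            return binom_pmf(n, x, p) >= u
    return True

def coverage(contains, n, p):
    """Exact coverage probability: sum of mu_p(x) over the x whose region contains p."""
    return sum(binom_pmf(n, x, p) for x in range(n + 1) if contains(x, n, p))

n, p1 = 10, 0.017
print("score    :", coverage(wilson_contains, n, p1))
print("level-set:", coverage(level_set_contains, n, p1))
```

At this small value of $p_1$ only the observation $X_1 = 0$ yields a score interval containing $p_1$, so the score coverage falls well below 0.95, while the level-set coverage stays above it by construction.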

3.2 Performances in the trinomial case (d = 3) 

To our knowledge, the Blyth-Still method has no counterpart for $d > 2$. In addition, the
regions obtained by the extended Clopper-Pearson method have non-competitive volumes.
We therefore decided to compare the level-set method with the natural multidimensional
extension of the Wilson score method. We computed for $d = 3$ the coverage probabilities
and the mean volumes of the 95%-regions obtained with both methods, for $n \in \{5, 10, 20\}$.
Note that for the score method, only the trace over $\Lambda_3$ of the regions is used to compute
the volume. The graphics in Figure 4 show the coverage of both methods as well as the
difference between their mean volumes. Whatever the sample size, the coverage of the
level-set regions is very close to $1 - \alpha = 0.95$. In contrast, the coverages of the score
regions can be much lower than 0.95. Surprisingly and in contrast with the binomial case
($d = 2$), the level-set method here provides confidence regions with mean volumes that
(for $n = 5$) are comparable to or smaller than their score counterparts. We believe that
this is because we measure the performance by the mean volume. The level-set method
thus appears to be a reasonable way to build small confidence sets.

3.3 Concrete example of the trinomial case (d = 3) 

The present example concerns antibiotic efficacy. A traditional way to evaluate whether
or not an antibiotic can be used for a specific pathogen is to perform a "susceptibility
testing". In such an experiment, different isolates of a given pathogen are classified as
"Sensible", "Intermediate" or "Resistant" according to the antibiotic's ability to stop
their growth. Here, ten different isolates of Escherichia coli were tested with ampicillin.
The following results were obtained: 8 isolates were Sensible, 2 Intermediate and 0 were
Resistant. The count $x = (8, 2, 0)$ can be seen as the realization of $X \sim \mathcal{M}_3(10, p)$ where
$p = (p_1, p_2, p_3)$ denotes the probability of a given isolate belonging to each of the different
classes. We calculated a 95%-confidence region of $p$ using the level-set method (Figure
5). This region suggests that even if none of the 10 tested isolates was observed to be
resistant, up to 30% of resistant and 20% of intermediate isolates are still possible.
This confidence region does not contain the situation where all the isolates are sensible
and it is thus unlikely that this antibiotic works all the time when it meets this pathogen.
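
A rough numerical picture of this region can be obtained by testing the points of a grid on the simplex. The sketch below is only indicative: the grid resolution of $1/20$ is an arbitrary choice, the function names are hypothetical, and values read from such a coarse grid should not be taken as the exact boundary of the region.

```python
from math import factorial, prod

def discrete_simplex(n, d):
    """Yield every count vector of length d summing to n."""
    if d == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in discrete_simplex(n - first, d - 1):
            yield (first,) + rest

def pmf(k, p):
    """Multinomial probability mu_p(k)."""
    coef = factorial(sum(k))
    for ki in k:
        coef //= factorial(ki)
    return coef * prod(pi ** ki for pi, ki in zip(p, k))

def in_region(x, p, alpha=0.05):
    """True when mu_p(x) >= u(p, alpha)."""
    total = 0.0
    support = discrete_simplex(sum(x), len(x))
    for u in sorted((pmf(k, p) for k in support), reverse=True):
        total += u
        if total >= 1 - alpha - 1e-12:  # tolerance for rounding (an assumption)
            return pmf(x, p) >= u
    return True

x = (8, 2, 0)   # Sensible, Intermediate, Resistant counts
steps = 20      # grid resolution 1/20 on the simplex (arbitrary)
grid = [(i / steps, j / steps, (steps - i - j) / steps)
        for i in range(steps + 1) for j in range(steps + 1 - i)]
inside = [p for p in grid if in_region(x, p)]
print(len(inside), "grid points out of", len(grid), "lie in the region")
print("largest p_3 on the grid:", max(p[2] for p in inside))
```

The vertex $(1, 0, 0)$, where every isolate would be sensible, is rejected because the observed count has probability zero under it, whereas the empirical frequencies $(0.8, 0.2, 0)$ belong to the region.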

3.4 Concrete example of the quadrinomial case (d = 4) 

The present example is simply a $\chi^2$-test for independence. It deals with the difference
in behavior of male and female veterinary students with respect to smoking habits. The
following result was observed in a group of 26 veterinary students in Toulouse:

              Smokers    Non-smokers
    Female       3            8
    Male        10            5
The $\chi^2$-test rejects independence with a P-value 0.047 and suggests that more males
than females smoke. This P-value is close to the critical threshold of 0.05 and was
obtained with a small sample size. Therefore, one can question whether this result can
be trusted. A possible solution is to build a confidence region. The table above can be
seen as the realization $x = (3, 8, 10, 5)$ of a multinomial random variable $X \sim \mathcal{M}_4(26, p)$
with $p = (p_1, p_2, p_3, p_4)$. If the smoking habit and the gender are independent then $p$
belongs to

$$H_0 = \{q \in \Lambda_4 \text{ such that } q = (uv, (1 - u)v, u(1 - v), (1 - u)(1 - v)) \text{ and } (u, v) \in [0, 1]^2\}.$$

Since $p_4 = 1 - p_1 - p_2 - p_3$, one can draw a graphic with only $p_1, p_2, p_3$. Figure 6 shows
(in green) the 95% confidence region for $p$ built with the level-set method. The surface
corresponds to the null hypothesis $H_0$. The red area is the acceptance region of the
$\chi^2$-test. It turns out that the estimate $\hat{p} = (3/26, 8/26, 10/26)$ does not belong to the
acceptance region of the $\chi^2$-test. However, the 95%-region for $p$ obtained with the level-set
method cuts $H_0$. Therefore, according to Remark 2.6 and in contrast to the result given by
the $\chi^2$-test, the independence hypothesis is not rejected.
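
The P-value quoted above can be recovered from the table with a few lines of code. The sketch below computes the Pearson statistic without continuity correction (an assumption, chosen because it reproduces the value 0.047) and uses the closed form of the $\chi^2$ survival function with one degree of freedom; the function name is hypothetical.

```python
from math import erfc, sqrt

def chi2_independence_2x2(table):
    """Pearson chi-square statistic and P-value for a 2 x 2 contingency table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (observed - expected) ** 2 / expected
    # For one degree of freedom, P(chi2 > s) = erfc(sqrt(s / 2))
    return stat, erfc(sqrt(stat / 2))

table = [[3, 8],    # Female: smokers, non-smokers
         [10, 5]]   # Male:   smokers, non-smokers
stat, p_value = chi2_independence_2x2(table)
print(stat, p_value)                  # about 3.94 and 0.047
print((3 * 5) / (8 * 10))             # sample odds ratio: 0.1875
```

The statistic is barely above the 5% critical value of the $\chi^2$ distribution with one degree of freedom, which is consistent with the borderline conclusion discussed above.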

The 95% level-set confidence region provides the following 95% confidence interval
for the odds ratio: [0.024; 1.712]. On the other hand, the inversion of Fisher's exact test
gives the 93.7% interval [0.187; 2.625]. This suggests that the level-set approach is less
conservative, probably because Fisher's exact test conditions on the row and column
totals, which increases the discreteness of the problem.
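
The statement that the 95% level-set region cuts $H_0$ can be explored numerically by testing whether a single point of $H_0$ belongs to the region. The sketch below is an illustration under assumptions, not the implementation used for Figure 6. It takes membership of $q$ in the region to mean that the outcomes no more likely than $x$ under the multinomial law with parameter $q$ carry a total mass greater than $\alpha$, which is one natural reading of a region built from level-sets of the likelihood; the precise definition is the one given earlier in the article. The point $q$ is a hypothetical choice, with $u$ and $v$ set to the empirical margins, and all function names are ours.

```python
import math


def multinomial_pmf(counts, probs):
    """Probability of the count vector `counts` under Multinomial(sum(counts), probs)."""
    log_p = math.lgamma(sum(counts) + 1)
    for k, q in zip(counts, probs):
        if k > 0 and q == 0.0:
            return 0.0
        log_p += (k * math.log(q) if k > 0 else 0.0) - math.lgamma(k + 1)
    return math.exp(log_p)


def compositions(n, d):
    """Yield all d-tuples of non-negative integers summing to n."""
    if d == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, d - 1):
            yield (first,) + rest


def level_set_mass(x, probs, rel_tol=1e-9):
    """Mass, under `probs`, of the outcomes that are no more likely than x."""
    n, d = sum(x), len(x)
    px = multinomial_pmf(x, probs)
    # The relative tolerance guards against floating-point ties.
    threshold = px * (1.0 + rel_tol)
    total = 0.0
    for k in compositions(n, d):
        pk = multinomial_pmf(k, probs)
        if pk <= threshold:
            total += pk
    return total


def in_region(x, probs, alpha=0.05):
    """Assumed membership rule: probs is kept when x is not in the alpha-tail."""
    return level_set_mass(x, probs) > alpha


x = (3, 8, 10, 5)
# Hypothetical point of H_0: u = proportion of smokers, v = proportion of females.
u, v = 13 / 26, 11 / 26
q = (u * v, (1 - u) * v, u * (1 - v), (1 - u) * (1 - v))
print("q in assumed 95% region:", in_region(x, q))
```

Exhibiting one point of $H_0$ inside the region is enough to conclude that the region meets $H_0$. The enumeration covers the 3654 possible outcomes for $d = 4$ and $n = 26$, which is cheap here but grows quickly with $d$ and $n$.
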

4 Final discussion 

The general concept of "covering collection" allows the construction of confidence regions
with controlled coverage, including the classical Clopper-Pearson interval for the binomial
and its multinomial extensions. The covering collection construction involves an
arbitrary growing collection of sets in the data space. Our "level-set" confidence regions
are obtained by using a special collection based on level-sets of the data distribution.
The level-set regions for the multinomial parameter can be easily computed for any $d$
and $n$. It turns out that they have excellent coverage probabilities and mean volumes for
$d \in \{2, 3\}$ and $n < 30$. They are in particular competitive with the famous Blyth-Still
intervals for $d = 2$. We therefore recommend the level-set method, even though it can be
computationally expensive when $d$ is large. The behavior of these confidence regions when the
ratio $d/n$ tends to infinity is a very interesting open problem. In this extreme case, the
observation $X$ is sparse and belongs to the boundary of the observation simplex $E_d$. Note
that the critical $n$ for which $X \sim \mathcal{M}_d(n, p)$ belongs to the interior of $E_d$ corresponds to
the classical "coupon collector problem" [15, 22, 19]. Another interesting open problem
is the optimality of the level-set regions related to the control of $\mathbb{P}(p' \in R_\alpha(X))$ with
$X \sim \mathcal{M}_d(n, p)$ and $p \neq p'$. It might also be interesting to extend the level-set method to
more complex situations such as hierarchical log-linear models.
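
To give an order of magnitude for the critical $n$ mentioned above, the sketch below treats the simplest coupon collector setting. It assumes that $p$ is uniform on the $d$ outcomes, an assumption made only for this illustration, and computes by inclusion-exclusion the probability that every coordinate of $X$ is positive, together with the expected number of draws needed to observe all outcomes. The function names are ours.

```python
from math import comb


def prob_all_outcomes_seen(n, d):
    """P(each of d equally likely outcomes appears at least once in n draws)."""
    # Inclusion-exclusion over the number j of outcomes that are never observed.
    return sum((-1) ** j * comb(d, j) * (1 - j / d) ** n for j in range(d + 1))


def expected_draws_to_see_all(d):
    """Expected number of draws until all d equally likely outcomes have appeared."""
    return d * sum(1 / k for k in range(1, d + 1))


if __name__ == "__main__":
    d = 4
    print("expected draws:", expected_draws_to_see_all(d))
    for n in (4, 8, 26):
        print(n, prob_all_outcomes_seen(n, d))
```

The alternating sum is adequate for small $d$; for large $d$ it suffers from numerical cancellation and a different formulation would be preferable.
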

Acknowledgements 

The present version of this article has greatly benefited from the comments and criticism 
of an Associate Editor and three anonymous referees. 

References 

[1] A. Agresti. Dealing with discreteness: making 'exact' confidence intervals for proportions,
differences of proportions, and odds ratios more exact. Stat. Methods Med. Res.,
12(1):3-21, 2003.

[2] A. Agresti and B. Caffo. Simple and effective confidence intervals for proportions
and differences of proportions result from adding two successes and two failures. 
Amer. Statist., 54(4):280-288, 2000. 

[3] A. Agresti and B. A. Coull. Approximate is better than "exact" for interval estimation
of binomial proportions. Amer. Statist., 52(2):119-126, 1998.

[4] J. Albert. Pseudo-Bayes estimation of multinomial proportions. Comm. Statist. 
A — Theory Methods, 10(16):1587-1611, 1981. 






[5] G. Berry and P. Armitage. Mid-p confidence intervals: a brief review. The Statistician,
44(4):417-423, 1995.

[6] C. R. Blyth. Approximate binomial confidence limits. J. Amer. Statist. Assoc., 
81(395):843-855, 1986. 

[7] C. R. Blyth. Correction: "Approximate binomial confidence limits" [J. Amer. Statist. 
Assoc. 81 (1986), no. 395, 843-855]. J. Amer. Statist. Assoc., 84(406):636, 1989. 

[8] C. R. Blyth and H. A. Still. Binomial confidence intervals. J. Amer. Statist. Assoc., 
78(381):108-116, 1983. 

[9] A. A. Borovkov. Mathematical statistics. Gordon and Breach Science Publishers, 
Amsterdam, 1998. Translated from the Russian by A. Moullagaliev and revised by 
the author. 

[10] L. D. Brown, T. T. Cai, and A. DasGupta. Interval estimation for a binomial 
proportion. Statist. Sci., 16(2):101-133, 2001. With comments and a rejoinder by 
the authors. 

[11] L. D. Brown, T. T. Cai, and A. DasGupta. Confidence intervals for a binomial 
proportion and asymptotic expansions. Ann. Statist., 30(1):160-201, 2002. 

[12] G. Casella. Refining binomial confidence intervals. Canad. J. Statist., 14(2):113-129, 
1986. 

[13] X. Chen, K. Zhou, and J. L. Aravena. On the binomial confidence interval and 
probabilistic robust control. Automatica J. IFAC, 40(10):1787-1789 (2005), 2004. 

[14] C. J. Clopper and E. S. Pearson. The use of confidence or fiducial limits illustrated 
in the case of the binomial. Biometrika, 26:404-413, 1934. 

[15] W. Feller. An introduction to probability theory and its applications. Vol. I. Third 
edition. John Wiley & Sons Inc., New York, 1968. 

[16] M. García-Pérez. Exact finite-sample significance and confidence regions for
goodness-of-fit statistics in one-way multinomials. British Journal of Mathematical
and Statistical Psychology, 53:193-207, 2000.

[17] J. Glaz and C. P. Sison. Simultaneous confidence intervals for multinomial proportions.
J. Statist. Plann. Inference, 82(1-2):251-262, 1999. Multiple comparisons (Tel
Aviv, 1996).

[18] P. Hall. The bootstrap and Edgeworth expansion. Springer Series in Statistics.
Springer-Verlag, New York, 1992.

[19] L. Holst. On birthday, collectors', occupancy and other classical urn problems.
Internat. Statist. Rev., 54(1):15-27, 1986.

[20] S. A. Julious. Two-sided confidence intervals for the single proportion: comparison 
of seven methods, letter to the editor. Statist. Med., 24:3383-3384, 2005. 






[21] D. Morales, L. Pardo, and L. Santamaria. Bootstrap confidence regions in multinomial
sampling. Appl. Math. Comput., 155(2):295-315, 2004.

[22] R. Motwani and P. Raghavan. Randomized algorithms. Cambridge University Press, 
Cambridge, 1995. 

[23] R. G. Newcombe. Two-sided confidence intervals for the single proportion: comparison
of seven methods. Statist. Med., 17:857-872, 1998.

[24] C. P. Robert. The Bayesian choice. Springer Texts in Statistics. Springer-Verlag,
New York, second edition, 2001. From decision-theoretic foundations to computational
implementation. Translated and revised from the French original by the
author.

[25] C. P. Sison and J. Glaz. Simultaneous confidence intervals and sample size determination
for multinomial proportions. J. Amer. Statist. Assoc., 90(429):366-369,
1995.

[26] A. Ullah, A. T. K. Wan, and A. Chaturvedi, editors. Handbook of applied econometrics
and statistical inference, volume 165 of Statistics: Textbooks and Monographs.
Marcel Dekker Inc., New York, 2002.

[27] E. B. Wilson. Probable inference, the law of succession, and statistical inference. J.
Amer. Statist. Assoc., 22:209-212, 1927.

[28] X.-D. Zheng and W.-Y. Loh. Bootstrapping binomial confidence intervals. J. Statist. 
Plann. Inference, 43(3):355-380, 1995. 



Djalil Chafaï, CORRESPONDING AUTHOR, d.chafai[@]envt.fr

UMR181 INRA, ENVT, École Nationale Vétérinaire de Toulouse
23 Chemin des Capelles, F-31076 Cedex 3, Toulouse, France.

UMR 5219 CNRS, Institut de Mathématiques, Université de Toulouse
118 route de Narbonne, F-31062 Cedex 4, Toulouse, France.






Figure 1: The construction of $A_k$ when $d = 3$, with $A_0 = \emptyset$ and $A_1 = \{(n, 0, 0)\}$. The
point in $A_1$ is at the beginning of the starting arrow represented as a dotted line. Each
time the arrow meets a point in the simplex, this point is added to $A_k$ to give $A_{k+1}$. The
set obtained with the first three arrows is invariant by permutation of coordinates.




Figure 2: Binomial case $d = 2$. The curves are the mean width of the 95%-intervals
obtained with the Blyth-Still method (thick line), the level-set method (thin line) and
the score method (dotted line) for $p_1 \in [0, 0.5]$. The Blyth-Still method gives intervals
with higher mean width irrespective of $p_1$. The score method always gives intervals with
smaller width. Note that the score method fails to control the coverage probability. As
$n$ increases, the differences between the mean widths of the respective intervals decrease.

Figure 3: Binomial case $d = 2$. These curves are the coverage of the 95%-intervals
obtained with the Blyth-Still method (thick line), the level-set method (thin line) and
the score method (dotted line) for $p_1 \in [0, 0.5]$. The score method fails to control the
coverage. The level-set method seems (nearly) uniformly better than the Blyth-Still
method: its coverages are closer to 0.95. When $n$ increases, the differences between these
three methods decrease.

Figure 4: Trinomial case $d = 3$. The columns give the coverages of the level-set method,
the coverages given by the score method and the difference of mean volumes. The three
rows correspond to $n \in \{5, 10, 20\}$. For the coverage graphs (first two columns), a light
color means that the coverage is close to 0.95 whereas a dark blue color means that the
coverage is smaller than 0.85. For the volume graphs (third column), a white color
means that the difference of mean volumes is small whereas the blue, pink and yellow
colors are used when the mean volume of the regions obtained with the level-set method
is smaller than that of their counterparts obtained with the score method.

Figure 5: Trinomial case $d = 3$ (example 3.3). In barycentric coordinates, the 95%-region
for $p$ is constructed from the observation $x = (0, 2, 8)$ of $\mathcal{M}_3(10, p)$. Note that
the Wald method cannot be used here since the observation belongs to the boundary of
the observation simplex $E_3$. In this example, the score and the level-set methods give
approximately the same region.

Figure 6: Quadrinomial case $d = 4$ (example 3.4). The axes correspond to $p_1$, $p_2$, and $p_3$.
The null hypothesis $H_0$ of the $\chi^2$-test is represented by the surface. The set in red is the
acceptance region of the $\chi^2$-test. The region in green is the 95%-region for $p$ built with
the level-set method. It turns out that the observed frequency point $(3/26, 8/26, 10/26)$ does
not belong to the acceptance region of the $\chi^2$-test while it belongs to the 95%-region for $p$
built with the level-set method. Additionally, since this confidence region cuts $H_0$, the
corresponding test does not reject $H_0$, in contrast to the $\chi^2$-test.